Introduction

The purpose of this tutorial is to get you comfortable with editing code within the RStudio interface and to familiarize you with a few functions that are helpful for visualizing data. I’m assuming you’ve already installed R and RStudio on your computer, but that you have minimal prior experience with programming in any language.

My goal isn’t for you to leave this session knowing how to do everything your would like to do. My goal is for you to leave this session with questions that you’re excited to figure out the answers to.

The GIS interface

Before we get started, let’s take a little tour of the RStudio interface:

  • Locations of panes and menus
  • Writing code in the console
  • Writing code in a script
  • Writing code in an RMarkdown file

Start a new RMarkdown file to follow along with the rest of this session.

Loading packages

Most of the functionality of R comes from packages. There are over 10,000 R packages available for a variety of uses, and some are quite specialized. Today, we will be using

  • tidyverse: a collection of packages that’s really useful for data visualization and data cleaning
  • sf: a set of tools for working with spatial data consistent with tidyverse methods
  • spData: a bunch of datasets for people to practice spatial data analysis
  • ggthemes: some nice themes you can use to quickly customize figures
  • ggspatial: some extra tools for creating attractive and useful maps.

You will need to install these packages if you haven’t already. Regardless of whether they’re already installed, you’ll need to load them using the library() function.

library(tidyverse)
library(sf)
library(spData)
library(ggthemes)
library(ggspatial)

Loading data

The house dataset is included with the spData package.

I’m going to load the dataset and save it to a data frame called homes. The st_as_sf() function converts the data into the sf format.

homes <- st_as_sf(house)

Once the data is loaded, you’ll see it in your environment tab. It contains data on 25,357 single family homes sold in Lucas County, Ohio between 1993 and 1998, based on data from the county auditor (this data is taken from the James P. LeSage’s Spatial Econometrics Toolbox for Matlab. The dataset includes the following variables:

  • price: selling price
  • yrbuilt: year built
  • stories: one of seven values: one, bilevel, multilvl, one+half, two, two+half, or three
  • TLA: total living area in square feet
  • Wall: one of seven values: stucdrvt, ccbtile, metlvnyl, brick, stone, wood, partbrk
  • beds: number of bedrooms
  • baths number of full bathrooms
  • halfbaths: number of half bathrooms
  • frontage: frontage in feet
  • depth: depth in feet
  • garage: one of four values: no garage, basement, attached, detached, carport
  • garagesqft: garage size in square feet, or zero for no garage
  • rooms: number of rooms
  • lotsize: lotsize in square feet
  • sdate: sale date in format, yymmdd, e.g., Oct 17, 1997 = 971017
  • avalue: auditor’s assessed value
  • s1993: 1 indicated the home was sold in 1993, otherwise 0
  • s1994: 1 indicated the home was sold in 1994, otherwise 0
  • s1995: 1 indicated the home was sold in 1995, otherwise 0
  • s1996: 1 indicated the home was sold in 1996, otherwise 0
  • s1997: 1 indicated the home was sold in 1997, otherwise 0
  • s1998: 1 indicated the home was sold in 1998, otherwise 0
  • syear: Year the home was sold
  • age: Age of the home in 1999, divided by 100

Exercise: Click on the name of the dataset in your environment tab or type View(homes) in your console to view the data a spreadsheet format.

Basic data visualization

The ggplot package is part of tidyverse, which was developed by Hadley Wickham. It offers a powerful set of tools for visualizing data, using an approach Wickham refers to as “a layered grammar of graphics,” which lets you create graphics using layers of commands.

Simple scatter plot

For the first “layer,” you’ll call the ggplot() function, which sets up the the plot and indicates that we’re working with the homes dataset we’ve loaded. Then you can add a layer to represent the data using geom_point(). You’ll need to specify which variable you’ll represent on the x-axis and which variable you’ll represent on the y-axis.

We’ll make a quick scatterplot showing the year the home was built on the x-axis and the price of the home on the y-axis.

ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price))

Exercise: In your own RStudio session, experiment with plotting other pairs of variables.

Additional variables

I can represent additional variables with color and size. For example, I might have different colors represent different types of buildings (the stories variable).

ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories))

And I might use the sizes of the points to represent each home’s square footage (the TLA variable).

ggplot(homes) +
  geom_point(aes(x = yrbuilt, y = price, color = stories, size = TLA))

It’s sort of hard to see what’s going on with all those dots on top of each other, so you might want to make them all a little bit transparent using the alpha argument.

Note: Characteristics that represent variables should go inside the aes() function, and characteristics you want to apply to all variables should go outside the aes() function.

ggplot(homes) +
  geom_point(aes(x = yrbuilt, 
                 y = price, 
                 color = stories, 
                 size = TLA),
             alpha = 0.25)

Exercise: In your own RStudio session, experiment with representing two or more of these variables in a variety of different ways.

If you want to go way beyond simple scatterplots, feel free to peruse the ggplot cheat sheet for inspiration.

Themes

You might also apply a theme to your scatterplot if you don’t like the default appearance. Here is the same scatterplot using a more minimalist theme.

ggplot(homes) +
  geom_point(aes(x = yrbuilt, 
                 y = price, 
                 color = stories, 
                 size = TLA),
             alpha = 0.25) +
  theme_bw()

And here it is using a theme inspired by plots that appear in the Wall Street Journal.

ggplot(homes) +
  geom_point(aes(x = yrbuilt, 
                 y = price, 
                 color = stories, 
                 size = TLA),
             alpha = 0.25) +
  theme_wsj()

Exercise: Experiment with applying a few different themes to your scatterplot.

Exercise: Create an attractive scatterplot using the house dataset and share it with a small group of three or four classmates.

Displaying Data on a Map

A scatter plot is useful for showing how two or more variables relate to one another, but we may also be interested in how they vary across space. This dataset includes spatial information, so we can plot it on a map using geom_sf().

Let’s create a map showing how the price of a single-family home varies across space.

ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5)

An appropriate theme for many maps is theme_map.

ggplot(homes) +
  geom_sf(aes(color = price), alpha = 0.5) + 
  theme_map()

You might want to orient your viewer using a basemap. The ggspatial package has a few you can choose from. Here’s the default.

ggplot(homes) +
  annotation_map_tile(zoomin = 0, progress = "none") +
  geom_sf(aes(color = price), alpha = 0.5) + 
  theme_map()

I also really like the black and white Stamen basemap.

ggplot(homes) +
  annotation_map_tile(zoomin = 0, progress = "none", type = "stamenbw") +
  geom_sf(aes(color = price), alpha = 0.5) + 
  theme_map()

annotation_map_tile() is a function from the ggspatial package that brings in base map images from Open Street Maps. You can see a list of available base maps using rosm::osm.types()

rosm::osm.types()
##  [1] "osm"                    "opencycle"              "hotstyle"              
##  [4] "loviniahike"            "loviniacycle"           "hikebike"              
##  [7] "hillshade"              "osmgrayscale"           "stamenbw"              
## [10] "stamenwatercolor"       "osmtransport"           "thunderforestlandscape"
## [13] "thunderforestoutdoors"  "cartodark"              "cartolight"

You can modify the appearance of the points on a map the same way you would for a scatterplot.

Exersize: Create an interesting map based on the house dataset and share it with a group of three or four classmates.